

Section: Bilateral Contracts and Grants with Industry

National Contracts

ADT JSnoori

The JSnoori ADT (2011-2012) is dedicated to porting the main functions of WinSnoori to Java and to integrating new facilities targeting language learning. The main objective is to offer functions enabling the development of feedback for foreign language learning, and more precisely for the mastery of prosody.

This year the architecture was changed to comply with the MVC (Model View Controller) pattern. This makes the management of interactions easier and clearly separates speech processing algorithms from interaction code. In addition, forced alignment facilities and phonetic editing tools have been integrated for French and English. They enable the segmentation of sentences uttered by learners and their annotation with the International Phonetic Alphabet (IPA).
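The MVC separation described above can be sketched as follows. This is a minimal illustration of the principle, not JSnoori's actual classes (the port itself is in Java; all names here are ours, and the equal-length segmentation stands in for the real forced-alignment algorithm):

```python
import numpy as np

class SpeechModel:
    """Holds the signal and the speech-processing algorithms; no UI code here."""
    def __init__(self, signal, rate):
        self.signal = np.asarray(signal, dtype=float)
        self.rate = rate
        self.listeners = []  # callbacks notified when the model changes

    def segment(self, n_segments):
        # Placeholder for forced alignment: equal-length segmentation.
        bounds = np.linspace(0, len(self.signal), n_segments + 1, dtype=int)
        self.segments = list(zip(bounds[:-1], bounds[1:]))
        for callback in self.listeners:
            callback(self.segments)
        return self.segments

class SegmentView:
    """Renders segment boundaries; knows nothing about signal processing."""
    def __init__(self):
        self.rendered = []

    def update(self, segments):
        self.rendered = [f"{a}-{b}" for a, b in segments]

class Controller:
    """Wires user actions to the model and model changes to the view."""
    def __init__(self, model, view):
        self.model, self.view = model, view
        model.listeners.append(view.update)

    def on_segment_request(self, n_segments):
        self.model.segment(n_segments)
```

The point of the pattern, as in the report, is that the model can be tested and reused without any interaction code, while views only react to model notifications.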

Preliminary versions of prosody diagnosis and feedback have been incorporated for English (see 6.1.6.1).

ANR ARTIS

This contract started in January 2009 in collaboration with LTCI (Paris), Gipsa-Lab (Grenoble) and IRIT (Toulouse). Its main purpose is the acoustic-to-articulatory inversion of speech signals. Unlike the European project ASPI, the approach followed in our group focuses on standard spectral input data, i.e. cepstral vectors. The objective of the project is to develop a demonstrator enabling the inversion of speech signals in the domain of second language learning.

This year the work focused on the development of inversion from cepstral data as input. We particularly worked on the comparison of cepstral vectors calculated on natural speech with those obtained via the articulatory-to-acoustic mapping. Bilinear frequency warping was combined with an affine adaptation of the cepstral coefficients; together, these two adaptation strategies enable a very good recovery of vocal tract shapes from natural speech. The second topic studied was access to the codebook. Two pruning strategies, a simple one using the spectral peak corresponding to F2 and a more elaborate one applying lax dynamic programming to spectral peaks, enable very efficient access to the articulatory codebook used for inversion.
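The two adaptation steps can be illustrated as follows. This is a sketch of the general idea only, not the project's implementation: it warps a real cepstrum by resampling its log spectrum on the frequency axis of a first-order all-pass (bilinear) transform, then applies an affine map c' = A c + b; the resampling route, quadrature, and function names are our assumptions.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Frequency mapping of a first-order all-pass (bilinear) transform."""
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

def warp_cepstrum(c, alpha, n_freq=2048):
    """Warp real cepstral coefficients by resampling their log spectrum
    on a bilinearly warped frequency axis (illustrative method)."""
    n = len(c)
    omega = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(n)
    # log-magnitude spectrum: c0 + 2 * sum_{k>=1} c_k cos(k*omega)
    weights = np.where(k == 0, 1.0, 2.0)
    log_spec = np.cos(np.outer(omega, k)) @ (weights * c)
    # warped spectrum at omega equals the original one at warp^-1(omega)
    log_spec_w = np.interp(omega, bilinear_warp(omega, alpha), log_spec)
    # back to cepstra: c_k = (1/pi) * integral_0^pi log|S| cos(k*omega) d(omega)
    quad = np.full(n_freq, omega[1] - omega[0])
    quad[0] *= 0.5
    quad[-1] *= 0.5  # trapezoid weights
    return (np.cos(np.outer(k, omega)) @ (quad * log_spec_w)) / np.pi

def affine_adapt(c, A, b):
    """Affine adaptation of cepstral coefficients: c' = A @ c + b."""
    return A @ c + b
```

With alpha = 0 the warp is the identity, which gives a convenient sanity check; a positive alpha stretches the low-frequency region, which is the usual way of compensating a mismatch between synthetic and natural spectra.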

ANR ViSAC

This ANR Jeunes Chercheurs project started in 2009, in collaboration with the Magrit group. The main purpose of ViSAC (Acoustic-Visual Speech Synthesis by Bimodal Unit Concatenation) is to propose a new approach to text-to-acoustic-visual speech synthesis that can animate a 3D talking head while providing the associated acoustic speech. The major originality of this work is to consider the speech signal as bimodal (composed of two channels, acoustic and visual) that can be "viewed" from either facet, visual or acoustic. The key advantage is to guarantee that the redundancy between the two facets of speech, acknowledged as a determining perceptual factor, is preserved.

We have designed a complete text-to-acoustic-visual speech synthesis system based on a relatively small corpus. The system uses bimodal diphones (an acoustic component and a visual one) selected with unit selection techniques. Although the database for the synthesis is small, the first results seem very promising. The system can be used with a larger corpus, and we are trying to acquire and analyze 1-2 hours of audiovisual speech.
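Selecting bimodal units amounts to a standard unit-selection Viterbi search in which both target and concatenation costs are computed jointly on the acoustic and visual channels, so that neither facet is optimized at the expense of the other. The sketch below illustrates this principle under our own assumptions (Euclidean distances, scalar weights, dictionary-based candidates); it is not the ViSAC cost function itself:

```python
import numpy as np

def select_units(targets, candidates, w_acoustic=1.0, w_visual=1.0, w_join=1.0):
    """Viterbi search over bimodal diphone candidates.

    targets: one (acoustic_spec, visual_spec) pair of feature vectors per slot.
    candidates: per slot, a list of dicts with 'acoustic' and 'visual' vectors.
    Returns the index of the selected candidate in each slot.
    """
    def target_cost(t, u):
        ta, tv = t
        return (w_acoustic * np.linalg.norm(ta - u['acoustic'])
                + w_visual * np.linalg.norm(tv - u['visual']))

    def join_cost(u, v):
        # Discontinuity at the concatenation point, on both channels jointly.
        return (w_acoustic * np.linalg.norm(u['acoustic'] - v['acoustic'])
                + w_visual * np.linalg.norm(u['visual'] - v['visual']))

    cost = [[target_cost(targets[0], u) for u in candidates[0]]]
    back = []
    for i in range(1, len(targets)):
        row, brow = [], []
        for u in candidates[i]:
            joins = [cost[i - 1][j] + w_join * join_cost(candidates[i - 1][j], u)
                     for j in range(len(candidates[i - 1]))]
            j_best = int(np.argmin(joins))
            row.append(joins[j_best] + target_cost(targets[i], u))
            brow.append(j_best)
        cost.append(row)
        back.append(brow)
    # Backtrack the cheapest path.
    k = int(np.argmin(cost[-1]))
    path = [k]
    for brow in reversed(back):
        k = brow[k]
        path.append(k)
    return list(reversed(path))
```

Because every cost term sums an acoustic and a visual distance, a candidate that matches one channel well but breaks continuity on the other is penalized, which is the property the bimodal formulation is meant to guarantee.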

Currently, we are mainly evaluating the system through both subjective (perceptual) and objective evaluations.